AITopics

Country:

North America > United States > Wisconsin (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Neural Information Processing Systems Feb-12-2026, 06:47:39 GMT

Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization

Dongruo Zhou, Pan Xu, Quanquan Gu

Neural Information Processing Systems http://nips.cc/

algorithm, gradient, gradient complexity

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.29)
North America > Canada > Quebec > Montreal (0.04)
Europe > Russia (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Neural Information Processing Systems Dec-23-2025, 17:50:56 GMT

Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization

In this paper, we introduce a simplified and unified method for finite-sum convex optimization, named Variance Reduction via Accelerated Dual Averaging (VRADA). In the general convex and smooth setting, VRADA can attain an $O\big(\frac{1}{n}\big)$-accurate solution in $O(n\log\log n)$ stochastic gradient evaluations, where $n$ is the number of samples; meanwhile, VRADA matches the lower bound of this setting up to a $\log\log n$ factor. In the strongly convex and smooth setting, VRADA matches the lower bound in the regime $n \le \Theta(\kappa)$, while it improves the rate in the regime $n\gg \kappa$ to $O\big(n +\frac{n\log(1/\epsilon)}{\log(n/\kappa)}\big)$, where $\kappa$ is the condition number. Besides improving the best known complexity results, VRADA has a more unified and simplified algorithmic implementation and convergence analysis for both the general convex and strongly convex settings. Through experiments on real datasets, we show that VRADA performs well compared with existing methods for large-scale machine learning problems.
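The abstract measures cost in stochastic gradient evaluations. As a point of reference only, the sketch below is a plain SVRG-style loop, the classical variance-reduction scheme, and not VRADA (whose accelerated dual-averaging updates are not reproduced here). It assumes a hypothetical one-dimensional least-squares finite sum $f(x) = \frac{1}{n}\sum_i \frac{1}{2}(a_i x - b_i)^2$, a fixed step size, and illustrative function names (`grad_i`, `svrg`). It counts every component-gradient evaluation so the cost can be compared with bounds such as $O(n\log\log n)$.

```python
import random


def grad_i(x, a, b):
    """Gradient of one component f_i(x) = 0.5 * (a*x - b)**2 (hypothetical model)."""
    return a * (a * x - b)


def svrg(data, x0, step, epochs, inner_steps, seed=0):
    """Plain SVRG on a 1-D least-squares finite sum (illustrative sketch, not VRADA).

    Returns the final iterate and the number of component-gradient
    evaluations (IFO calls) that were used.
    """
    rng = random.Random(seed)
    n = len(data)
    x = x0
    ifo_calls = 0
    for _ in range(epochs):
        # Snapshot point and its full gradient: costs n component gradients.
        snapshot = x
        full_grad = sum(grad_i(snapshot, a, b) for a, b in data) / n
        ifo_calls += n
        for _ in range(inner_steps):
            a, b = data[rng.randrange(n)]
            # Variance-reduced estimator: unbiased for the full gradient at x,
            # and its variance shrinks as x and the snapshot approach the optimum.
            v = grad_i(x, a, b) - grad_i(snapshot, a, b) + full_grad
            ifo_calls += 2
            x -= step * v
    return x, ifo_calls
```

For example, `svrg([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)], 0.0, step=0.05, epochs=30, inner_steps=10)` uses 30 * (3 + 2 * 10) = 690 gradient evaluations and should return an iterate close to the minimizer $x = 2$.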

accelerated dual averaging, name change, variance reduction

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems Nov-20-2025, 14:31:52 GMT

Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization

Dongruo Zhou, Pan Xu, Quanquan Gu

In this work, we mainly focus on first-order algorithms, which need only function value and gradient evaluations.

algorithm, artificial intelligence, machine learning

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.29)
North America > Canada > Quebec > Montreal (0.04)
Europe > Russia (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Neural Information Processing Systems Aug-19-2025, 23:18:08 GMT

A unified variance-reduced accelerated gradient method for convex optimization

Guanghui Lan, Zhize Li, Yi Zhou

We propose a novel randomized incremental gradient algorithm, namely, VAriance-Reduced Accelerated Gradient (Varag), for finite-sum optimization.

convergence, optimization, varag

Country:

North America > United States > Wisconsin (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Neural Information Processing Systems Oct-9-2024, 11:32:15 GMT

Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization

In this paper, we introduce a simplified and unified method for finite-sum convex optimization, named Variance Reduction via Accelerated Dual Averaging (VRADA). In the general convex and smooth setting, VRADA can attain an $O\big(\frac{1}{n}\big)$-accurate solution in $O(n\log\log n)$ stochastic gradient evaluations, where $n$ is the number of samples; meanwhile, VRADA matches the lower bound of this setting up to a $\log\log n$ factor. In the strongly convex and smooth setting, VRADA matches the lower bound in the regime $n \le \Theta(\kappa)$, while it improves the rate in the regime $n\gg \kappa$ to $O\big(n +\frac{n\log(1/\epsilon)}{\log(n/\kappa)}\big)$, where $\kappa$ is the condition number. Besides improving the best known complexity results, VRADA has a more unified and simplified algorithmic implementation and convergence analysis for both the general convex and strongly convex settings. Through experiments on real datasets, we show that VRADA performs well compared with existing methods for large-scale machine learning problems.

accelerated dual averaging, finite-sum optimization, variance reduction

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence Aug-19-2024

Faster Adaptive Decentralized Learning Algorithms

Huang, Feihu, Zhao, Jianyu

Decentralized learning has recently received increasing attention in machine learning due to its advantages in implementation simplicity, system robustness, and data privacy. Meanwhile, adaptive gradient methods show superior performance in many machine learning tasks such as training neural networks. Although some works study decentralized optimization algorithms with adaptive learning rates, these adaptive decentralized algorithms still suffer from high sample complexity. To fill this gap, we propose a class of faster adaptive decentralized algorithms (i.e., AdaMDOS and AdaMDOF) for distributed nonconvex stochastic and finite-sum optimization, respectively. Moreover, we provide a solid convergence analysis framework for our methods. In particular, we prove that AdaMDOS obtains a near-optimal sample complexity of $\tilde{O}(\epsilon^{-3})$ for finding an $\epsilon$-stationary solution of nonconvex stochastic optimization. Meanwhile, AdaMDOF obtains a near-optimal sample complexity of $O(\sqrt{n}\epsilon^{-2})$ for finding an $\epsilon$-stationary solution of nonconvex finite-sum optimization, where $n$ denotes the sample size. To the best of our knowledge, AdaMDOF is the first adaptive decentralized algorithm for nonconvex finite-sum optimization. Experimental results demonstrate the efficiency of our algorithms.
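AdaMDOS and AdaMDOF are not specified here in enough detail to reproduce, but the communication pattern shared by decentralized methods can be sketched. The example below is plain decentralized gradient descent (DGD), assuming a ring of nodes, a mixing matrix that puts weight 1/3 on each node and its two neighbors, and hypothetical local objectives $f_i(x) = \frac{1}{2}(x - c_i)^2$. It has none of the momentum-based variance reduction or adaptive learning rates the abstract describes; the names `ring_mixing` and `dgd` are illustrative.

```python
def ring_mixing(x):
    """One gossip round on a ring: each node averages itself and its two neighbors.

    The implied mixing matrix (weight 1/3 on self and each neighbor) is
    symmetric and doubly stochastic, so the network average is preserved.
    """
    n = len(x)
    return [(x[i - 1] + x[i] + x[(i + 1) % n]) / 3.0 for i in range(n)]


def dgd(targets, steps, lr):
    """Decentralized gradient descent on hypothetical f_i(x) = 0.5 * (x - c_i)**2.

    Node i only knows its own target c_i; the network objective is the
    average of the f_i, which is minimized at the mean of the targets.
    """
    x = [0.0] * len(targets)
    for _ in range(steps):
        mixed = ring_mixing(x)
        # Standard DGD: mix neighbors' values, then take a local gradient step
        # evaluated at the node's own pre-mixing iterate.
        x = [m - lr * (xi - c) for m, xi, c in zip(mixed, x, targets)]
    return x
```

Because the ring weights are doubly stochastic and the local gradients are linear, the average of the node values converges to the mean of the targets (2.5 for `dgd([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], steps=300, lr=0.1)`). With a constant `lr`, the individual nodes agree only approximately.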

algorithm, inequality, optimization

2408.09775

Country:

Europe > Austria > Vienna (0.14)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.83)

Industry: Information Technology > Security & Privacy (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)

arXiv.org Artificial Intelligence Jun-4-2024

Adaptive Variance Reduction for Stochastic Optimization under Weaker Assumptions

Jiang, Wei, Yang, Sifan, Wang, Yibo, Zhang, Lijun

This paper explores adaptive variance reduction methods for stochastic optimization based on the STORM technique. Existing adaptive extensions of STORM rely on strong assumptions like bounded gradients and bounded function values, or suffer from an additional $\mathcal{O}(\log T)$ term in the convergence rate. To address these limitations, we introduce a novel adaptive STORM method that achieves an optimal convergence rate of $\mathcal{O}(T^{-1/3})$ for non-convex functions with our newly designed learning rate strategy. Compared with existing approaches, our method requires weaker assumptions and attains the optimal convergence rate without the additional $\mathcal{O}(\log T)$ term. We also extend the proposed technique to stochastic compositional optimization, obtaining the same optimal rate of $\mathcal{O}(T^{-1/3})$. Furthermore, we investigate the non-convex finite-sum problem and develop another adaptive variance reduction method that achieves an optimal convergence rate of $\mathcal{O}(n^{1/4} T^{-1/2})$, where $n$ represents the number of component functions. Numerical experiments across various tasks validate the effectiveness of our method.
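The STORM technique that this work builds on replaces the plain stochastic gradient with a recursive momentum estimate, $d_t = \nabla f(x_t;\xi_t) + (1-a)\big(d_{t-1} - \nabla f(x_{t-1};\xi_t)\big)$, which evaluates one fresh sample $\xi_t$ at both the current and the previous iterate. The sketch below assumes a fixed momentum parameter `a` and a fixed step size `lr` on a hypothetical noisy quadratic $f(x;\xi) = \frac{1}{2}(x-\xi)^2$ with $\xi \sim \mathcal{N}(\mu, \sigma^2)$. The adaptive learning-rate strategy proposed in the paper is not reproduced, and the function names are illustrative.

```python
import random


def stoch_grad(x, xi):
    """Gradient of the hypothetical sample loss f(x; xi) = 0.5 * (x - xi)**2."""
    return x - xi


def storm(x0, steps, lr, a, mu=3.0, sigma=1.0, seed=0):
    """STORM-style recursive momentum with constant lr and momentum parameter a.

    The population objective E[f(x; xi)] is minimized at x = mu.
    """
    rng = random.Random(seed)
    x_prev = x0
    xi = rng.gauss(mu, sigma)
    d = stoch_grad(x_prev, xi)           # initialize with a plain stochastic gradient
    x = x_prev - lr * d
    for _ in range(steps):
        xi = rng.gauss(mu, sigma)        # one fresh sample per iteration, used at
        # both the current and the previous iterate in the correction term
        d = stoch_grad(x, xi) + (1.0 - a) * (d - stoch_grad(x_prev, xi))
        x_prev, x = x, x - lr * d
    return x
```

With `sigma=0.0` the samples are exact and the iterate converges geometrically to `mu`. The correction term $\nabla f(x_t;\xi_t) - \nabla f(x_{t-1};\xi_t)$ evaluated on a shared sample is what distinguishes STORM from ordinary momentum.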

algorithm, convergence rate, optimization

2406.01959

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Artificial Intelligence Feb-4-2024

On the Complexity of Finite-Sum Smooth Optimization under the Polyak-Łojasiewicz Condition

Bai, Yunyan, Liu, Yuxing, Luo, Luo

This paper considers the optimization problem of the form $\min_{{\bf x}\in{\mathbb R}^d} f({\bf x})\triangleq \frac{1}{n}\sum_{i=1}^n f_i({\bf x})$, where $f(\cdot)$ satisfies the Polyak-Łojasiewicz (PL) condition with parameter $\mu$ and $\{f_i(\cdot)\}_{i=1}^n$ is $L$-mean-squared smooth. We show that any gradient method requires at least $\Omega(n+\kappa\sqrt{n}\log(1/\epsilon))$ incremental first-order oracle (IFO) calls to find an $\epsilon$-suboptimal solution, where $\kappa\triangleq L/\mu$ is the condition number of the problem. This result nearly matches the upper bounds on IFO complexity of the best-known first-order methods. We also study the problem of minimizing a PL function in the distributed setting, where the individual functions $f_1(\cdot),\dots,f_n(\cdot)$ are located on a connected network of $n$ agents. We provide lower bounds of $\Omega(\kappa/\sqrt{\gamma}\,\log(1/\epsilon))$, $\Omega((\kappa+\tau\kappa/\sqrt{\gamma}\,)\log(1/\epsilon))$ and $\Omega\big(n+\kappa\sqrt{n}\log(1/\epsilon)\big)$ for communication rounds, time cost and local first-order oracle calls, respectively, where $\gamma\in(0,1]$ is the spectral gap of the mixing matrix associated with the network and $\tau>0$ is the time cost per communication round. Furthermore, we propose a decentralized first-order method that nearly matches the above lower bounds in expectation.
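For reference, the two assumptions in this abstract are commonly stated as follows. These are the standard textbook forms, written with $f^* = \inf_{\mathbf{x}} f(\mathbf{x})$; the paper's exact normalization may differ.

```latex
\begin{aligned}
&\text{PL condition:} && \tfrac{1}{2}\,\|\nabla f(\mathbf{x})\|^2 \ge \mu\,\big(f(\mathbf{x}) - f^*\big) \quad \text{for all } \mathbf{x}\in\mathbb{R}^d,\\
&\text{Mean-squared smoothness:} && \frac{1}{n}\sum_{i=1}^n \|\nabla f_i(\mathbf{x}) - \nabla f_i(\mathbf{y})\|^2 \le L^2\,\|\mathbf{x}-\mathbf{y}\|^2 \quad \text{for all } \mathbf{x},\mathbf{y}\in\mathbb{R}^d.
\end{aligned}
```

Every $\mu$-strongly convex function satisfies the PL condition with the same $\mu$, but PL does not require convexity. By Jensen's inequality, mean-squared smoothness implies that the average $f$ is itself $L$-smooth.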

complexity, inequality, optimization

2402.02569

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Artificial Intelligence Oct-10-2022

Distributed stochastic proximal algorithm with random reshuffling for non-smooth finite-sum optimization

Jiang, Xia, Zeng, Xianlin, Sun, Jian, Chen, Jie, Xie, Lihua

Non-smooth finite-sum minimization is a fundamental problem in machine learning. This paper develops a distributed stochastic proximal-gradient algorithm with random reshuffling to solve finite-sum minimization over time-varying multi-agent networks. The objective function is a sum of differentiable convex functions and a non-smooth regularizer. Each agent in the network updates its local variables with a constant step size using local information, and cooperates with the other agents to seek an optimal solution. We prove that the local variable estimates generated by the proposed algorithm achieve consensus and are attracted to a neighborhood of the optimal solution in expectation with an $\mathcal{O}(\frac{1}{T}+\frac{1}{\sqrt{T}})$ convergence rate, where $T$ is the total number of iterations. Finally, comparative simulations are provided to verify the convergence performance of the proposed algorithm.
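Two ingredients named in the abstract can be illustrated for a single agent: a proximal step for the non-smooth regularizer, and random reshuffling, where each epoch visits every sample once in a fresh random order instead of sampling with replacement. The sketch assumes a hypothetical one-dimensional objective $\frac{1}{n}\sum_i \frac{1}{2}(a_i x - b_i)^2 + \lambda|x|$, whose regularizer has soft-thresholding as its proximal operator. The multi-agent consensus over a time-varying network is omitted, so this is not the paper's distributed algorithm, and the function names are illustrative.

```python
import random


def soft_threshold(v, tau):
    """Proximal operator of tau * |x| (soft-thresholding)."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def prox_sgd_rr(data, lam, step, epochs, x0=0.0, seed=0):
    """Single-agent proximal SGD with random reshuffling for the hypothetical problem
    min_x (1/n) * sum_i 0.5 * (a_i * x - b_i)**2 + lam * |x|.
    """
    rng = random.Random(seed)
    order = list(range(len(data)))
    x = x0
    for _ in range(epochs):
        rng.shuffle(order)                 # new permutation each epoch, no replacement
        for i in order:
            a, b = data[i]
            grad = a * (a * x - b)         # gradient of the smooth component only
            # Gradient step on the smooth part, then prox of the non-smooth part.
            x = soft_threshold(x - step * grad, step * lam)
    return x
```

For identical components $(a_i, b_i) = (1, 2)$ the minimizer is $x = 2 - \lambda$ when $\lambda < 2$ and $x = 0$ otherwise, so `prox_sgd_rr([(1.0, 2.0), (1.0, 2.0)], lam=0.5, step=0.1, epochs=200)` should return approximately 1.5. With heterogeneous components and a constant step size, iterates of this kind typically settle in a neighborhood of the minimizer rather than converging exactly.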

algorithm, artificial intelligence, machine learning

2111.0382

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > China > Chongqing Province > Chongqing (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)

Genre: Research Report (0.50)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)